
What's not done / known gaps #5

Open
ArksherX wants to merge 2 commits into SL5TaskForce:main from ArksherX:feature/per-identity-rate-limit


@ArksherX ArksherX commented May 16, 2026

Summary

Adds per-identity byte-rate limiting to data tunnelled through the gateway. Each agent identity (extracted from the mTLS client certificate extension) gets its own token-bucket rate limiter. If an identity exceeds its configured throughput limit, its copy loop is throttled automatically.
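As a rough illustration of what "throttling the copy loop" means, here is a minimal std-only sketch. It is not the PR's code: the PR uses governor inside an async tokio tunnel, whereas this stand-in uses a hand-written token bucket and a blocking copy. The names `ByteBucket` and `throttled_copy` are hypothetical.

```rust
use std::io::{self, Read, Write};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical hand-rolled token bucket (a stand-in for governor).
pub struct ByteBucket {
    rate: f64,     // refill rate in bytes per second
    capacity: f64, // maximum burst in bytes
    tokens: f64,   // bytes currently available
    last: Instant, // time of the last refill
}

impl ByteBucket {
    pub fn new(bytes_per_second: u32, burst_bytes: u32) -> Self {
        assert!(bytes_per_second > 0 && burst_bytes > 0);
        Self {
            rate: f64::from(bytes_per_second),
            capacity: f64::from(burst_bytes),
            tokens: f64::from(burst_bytes), // start with a full bucket
            last: Instant::now(),
        }
    }

    /// Block the current thread until `n` bytes may pass, then consume them.
    /// `n` must not exceed the burst capacity.
    pub fn acquire(&mut self, n: usize) {
        let need = n as f64;
        assert!(need <= self.capacity, "chunk larger than burst capacity");
        loop {
            let now = Instant::now();
            let refill = now.duration_since(self.last).as_secs_f64() * self.rate;
            self.tokens = (self.tokens + refill).min(self.capacity);
            self.last = now;
            if self.tokens >= need {
                self.tokens -= need;
                return;
            }
            // Sleep for the time it takes to refill the missing tokens.
            thread::sleep(Duration::from_secs_f64((need - self.tokens) / self.rate));
        }
    }
}

/// Copy `src` to `dst`, waiting on the bucket before each chunk is written.
pub fn throttled_copy<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    bucket: &mut ByteBucket,
    chunk: usize,
) -> io::Result<u64> {
    // Never request more than the bucket can ever hold.
    let chunk = chunk.clamp(1, bucket.capacity as usize);
    let mut buf = vec![0u8; chunk];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        bucket.acquire(n);
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

In an async tunnel the sleep would be an awaited timer rather than a blocking `thread::sleep`, and one bucket would be shared by all connections of the same identity.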

Components of the Gateway (task item 1)

The gateway is composed of the following parts:

  • main.rs — Entry point. Loads config, sets up TLS, starts the TCP listener, and dispatches connections to MakeProxyService.
  • proxy.rs — Core request handling. Implements the Service trait for hyper, extracts the destination from the CONNECT request, calls the policy engine, opens the upstream TCP connection, and spawns the bidirectional tunnel.
  • policy.rs — Authorization logic. Extracts the agent identity from the custom X.509 certificate extension, queries PostgreSQL to check for a valid signed permission row, and returns Allow or Deny.
  • config.rs — Typed configuration structs deserialized from config.toml.
  • tls.rs — Sets up the rustls server config for mTLS, requiring and verifying client certificates.
  • rate_limit.rs (new) — Per-identity token bucket rate limiters backed by governor and stored in a DashMap.

What the feature does

  • Reads [rate_limit] from config.toml: bytes_per_second and burst_bytes
  • Maintains an in-memory DashMap<String, Arc<RateLimiter>> keyed by agent identity
  • When a tunnel is spawned, if bytes_per_second > 0, the bidirectional copy is wrapped with rate limiter checks per identity
  • Logs when rate limiting is active for a connection
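The bullets above can be sketched as a small per-identity registry. This is a std-only approximation under stated assumptions: `Mutex<HashMap<..>>` stands in for the DashMap, a hand-written `TokenBucket` stands in for governor's limiter, and time is passed in as milliseconds so the behaviour is deterministic. The `RateLimitConfig` fields mirror the config keys named above; `LimiterRegistry` and `for_identity` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Assumed shape of the `[rate_limit]` section of config.toml.
#[derive(Clone, Copy, Debug)]
pub struct RateLimitConfig {
    pub bytes_per_second: u64,
    pub burst_bytes: u64,
}

/// Hypothetical token bucket with an explicit clock (milliseconds).
#[derive(Debug)]
pub struct TokenBucket {
    cfg: RateLimitConfig,
    tokens: u64,
    last_ms: u64,
}

impl TokenBucket {
    pub fn new(cfg: RateLimitConfig, now_ms: u64) -> Self {
        Self { cfg, tokens: cfg.burst_bytes, last_ms: now_ms }
    }

    /// Refill for the elapsed time, then try to take `n` bytes.
    /// Sub-byte refill remainders are dropped for simplicity.
    pub fn try_take(&mut self, n: u64, now_ms: u64) -> bool {
        let elapsed = now_ms.saturating_sub(self.last_ms);
        let refill = elapsed.saturating_mul(self.cfg.bytes_per_second) / 1000;
        if refill > 0 {
            self.tokens = self.tokens.saturating_add(refill).min(self.cfg.burst_bytes);
            self.last_ms = now_ms;
        }
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }
}

/// One shared bucket per agent identity, created lazily.
#[derive(Default)]
pub struct LimiterRegistry {
    buckets: Mutex<HashMap<String, Arc<Mutex<TokenBucket>>>>,
}

impl LimiterRegistry {
    /// Returns `None` when limiting is disabled (bytes_per_second == 0).
    pub fn for_identity(
        &self,
        identity: &str,
        cfg: RateLimitConfig,
        now_ms: u64,
    ) -> Option<Arc<Mutex<TokenBucket>>> {
        if cfg.bytes_per_second == 0 {
            return None;
        }
        let mut map = self.buckets.lock().unwrap();
        let bucket = map
            .entry(identity.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(TokenBucket::new(cfg, now_ms))))
            .clone();
        Some(bucket)
    }
}
```

Two tunnels opened by the same identity receive the same `Arc`, so they draw from one budget, while a different identity gets an independent bucket.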

Design choices and tradeoffs

In-memory over persistent storage
State lives in a DashMap on the heap. This means limits reset on gateway restart and are not shared across multiple gateway instances. This is the right tradeoff for a single-node SL5 weight enclave deployment — restarts are controlled events, and the SL5 threat model assumes a single-facility enclave. Adding distributed state (e.g. a Postgres counter per identity per time window) would be the correct next step for multi-node deployments, at the cost of a database roundtrip per data chunk.

Global config, not per-identity config
All identities share the same bytes_per_second and burst_bytes values from config. Per-identity limits would require either a config map keyed by identity string or a new database column — straightforward to add but out of scope for this implementation given the simplicity priority.
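For reference, the config-map variant mentioned above might look roughly like the following. This is only a sketch of the out-of-scope option, not something in the PR; `RateLimitSettings`, `global`, `overrides`, and `limit_for` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RateLimitConfig {
    pub bytes_per_second: u64,
    pub burst_bytes: u64,
}

/// Hypothetical settings: one global limit plus optional per-identity overrides.
pub struct RateLimitSettings {
    pub global: RateLimitConfig,
    pub overrides: HashMap<String, RateLimitConfig>,
}

impl RateLimitSettings {
    /// Use the identity's override if one exists, else the global limit.
    pub fn limit_for(&self, identity: &str) -> RateLimitConfig {
        self.overrides.get(identity).copied().unwrap_or(self.global)
    }
}
```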

What it protects against and what it doesn't
The rate limit addresses sustained bulk exfiltration: an agent continuously streaming large volumes of data will be throttled. It does not address short bursts that fit within the burst_bytes allowance, and it does not address an adversary who controls multiple distinct identities, since each identity gets its own bucket. It is a bandwidth control, not a session control.
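To make these limits concrete, the worst-case volume that can leave within a time window follows from the bucket parameters: each identity can move at most its burst plus rate times elapsed time. The helper below is a back-of-the-envelope sketch, not part of the PR, and `max_bytes_in_window` is a hypothetical name.

```rust
/// Upper bound on bytes that `identities` independent buckets can pass
/// in `window_secs` seconds, each starting with a full burst allowance.
pub fn max_bytes_in_window(
    bytes_per_second: u64,
    burst_bytes: u64,
    window_secs: u64,
    identities: u64,
) -> u64 {
    let per_identity =
        burst_bytes.saturating_add(bytes_per_second.saturating_mul(window_secs));
    per_identity.saturating_mul(identities)
}
```

With illustrative values of 1 MiB/s and a 4 MiB burst, one identity can move at most 64 MiB in 60 seconds, and an adversary holding ten identities can move ten times that. The burst alone (4 MiB here) passes without any throttling.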

Crate choice: governor
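governor provides a well-tested rate limiter with no_std support and minimal dependencies. It implements the Generic Cell Rate Algorithm (GCRA), which is functionally equivalent to a leaky bucket, so the bucket-style rate and burst semantics described here still apply. RateLimiter::direct with a Quota::per_second is the simplest correct primitive for bytes-per-second limiting.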

Implementation process and tools used

  • Read the upstream codebase to understand where the bidirectional copy happens (spawn_tunnel in proxy.rs)
  • Used Claude (Anthropic) extensively for Rust-specific guidance: fixing lifetime errors with tokio::spawn, resolving clippy pedantic lints (needless_pass_by_value, clone_on_copy), and deriving Copy on RateLimitConfig to satisfy both the borrow checker and clippy simultaneously
  • All design decisions were made independently; Claude was used as a Rust reference, not an architect

What's not done / known gaps

  • No per-identity config (all identities share the same limit)
  • No integration test verifying throughput is measurably capped (the rate
    limit code path is exercised and all 61 tests pass, but a timing-based
    throughput assertion was not added to avoid flakiness in CI)

@ArksherX ArksherX changed the title feat: add per-identity data rate limiting What's not done / known gaps May 16, 2026